Abstract

Segmentation involves separating an object from the background. In this
work, we propose a novel segmentation method combining image information
with prior shape knowledge, within the level-set framework. Following the
work of Leventon et al., we revisit the use of principal component
analysis (PCA) to introduce prior knowledge about shapes in a more robust
manner. To this end, we utilize Kernel PCA and show that this method of
learning shapes outperforms linear PCA, by allowing only shapes that are
close enough to the training data. In the proposed segmentation
algorithm, shape knowledge and image information are encoded into two
energy functionals entirely described in terms of shapes. This consistent
description makes it possible to take full advantage of the Kernel PCA
methodology and leads to promising segmentation results. In particular,
our shape-driven segmentation technique allows for the simultaneous
encoding of multiple types of shapes, and offers a convincing level of
robustness with respect to noise, clutter, partial occlusions, or
smearing.


1.  Introduction 

Segmentation consists of extracting an object from an image, a ubiquitous
task in computer vision applications. It is useful in applications
ranging from finding special features in medical images to tracking
deformable objects; see [7, 14, 16, 17] and the references therein. The
active contour methodology has proven to be quite valuable for performing
this task. However, the use of image information alone often leads to
poor segmentation results in the presence of noise, clutter or occlusion.
The introduction of shape priors in the contour evolution process has
been shown to be an effective way to address this issue, leading to more
robust segmentation performance.

Many different methods that use a parameterized or an explicit
representation for contours have been proposed [2, 15, 3]. In [4], the
authors use a B-spline parametrization to build shape models in the
kernel space [8]. These models were then used in the segmentation process
to provide a shape prior. The geometric active contour (GAC) framework
(see [12] and the references therein) involves a parameter-free
representation of contours, i.e., a contour is represented implicitly by
the zero level set of a higher dimensional function, typically a signed
distance function [9]. In [7], the authors obtain the shape statistics by
performing linear principal component analysis (PCA) on a training set of
signed distance functions (SDFs). This approach was shown to be able to
convincingly capture small variations in the shape of an object. It
inspired other schemes for obtaining shape priors, notably those
described in [14, 11], where SDFs were used to learn the shape
variations.

However, when the object considered for learning may undergo complex or
non-linear deformations, linear PCA can lead to unrealistic shape priors,
by allowing linear combinations of the learnt shapes that are unfaithful
to the true shape of the object. Cremers et al. successfully pioneered
the use of kernel methods to address this issue, within the GAC framework
[5]. In the present work, we propose to use Kernel PCA to introduce shape
priors for GACs. Kernel PCA was presented by Schölkopf [8] and makes it
possible to combine the precision of kernel methods with the reduction of
dimension in the training set. This is the first time, to our knowledge,
that Kernel PCA is explicitly used to introduce shape priors in the GAC
framework. In this paper, we also propose a novel intensity segmentation
method, specifically tailored to allow for the inclusion of shape priors.

In the next section, we compare linear PCA to Kernel PCA, using SDFs and
binary maps as representations of shapes. In Section 3, we propose an
intensity-based energy functional in terms of binary shapes for
separating an object from the background in an image. These energies are
qualitatively similar to the ones proposed in [1, 10] but quantitatively
different. In Section 4, we present a robust segmentation framework,
combining image cues and shape knowledge in a consistent fashion. The
robustness of the proposed algorithm is demonstrated on various
challenging examples in Section 5.

2.  Kernel  PCA  for  shape  prior 

Kernel PCA can be considered a generalization of linear principal
component analysis. This technique was introduced by Schölkopf [8], and
has proven to be a powerful method to extract nonlinear structures from a
data set. The idea behind Kernel PCA consists of mapping a data set from
an input space $\mathcal{X}$ into a feature space $F$ via a nonlinear
function $\varphi$. Then, PCA is performed in $F$ to find the orthogonal
directions (principal components) corresponding to the largest variation
in the mapped data set. The first $l$ principal components account for as
much of the variance in the data as possible by using $l$ directions. In
addition, the error in representing any of the elements of the training
set by its projection onto the first $l$ principal components is minimal
in the least-squares sense.

Through the use of Mercer kernels, the nonlinear map $\varphi$ typically
does not need to be known explicitly. A Mercer kernel is a function
$k(\cdot,\cdot)$ such that for all data points $\chi_i$, the kernel
matrix $K(i,j) = k(\chi_i, \chi_j)$ is symmetric positive definite [8].
It can be shown that using $k(\cdot,\cdot)$ one can obtain the inner
scalar product in $F$: $k(\chi_a, \chi_b) = (\varphi(\chi_a) \cdot
\varphi(\chi_b))$, with $\chi_a, \chi_b \in \mathcal{X}$.

We now briefly describe the Kernel PCA method [6, 8]. Let
$\tau = \{\chi_1, \chi_2, \ldots, \chi_N\}$ be a set of training data.
The centered kernel matrix $\tilde{K}$ corresponding to $\tau$ is defined
as

$$\tilde{K}(i,j) = \big((\varphi(\chi_i) - \bar{\varphi}) \cdot (\varphi(\chi_j) - \bar{\varphi})\big) = \big(\tilde{\varphi}(\chi_i) \cdot \tilde{\varphi}(\chi_j)\big) = \tilde{k}(\chi_i, \chi_j), \quad i, j \in [|1, N|] \qquad (1)$$

with $\bar{\varphi} = \frac{1}{N} \sum_{i=1}^{N} \varphi(\chi_i)$,
$\tilde{\varphi}(\chi_i) = \varphi(\chi_i) - \bar{\varphi}$ being the
centered map corresponding to $\chi_i$, and $\tilde{k}(\cdot,\cdot)$
denoting the centered kernel function. Since $\tilde{K}$ is symmetric,
using Singular Value Decomposition, it can be decomposed as

$$\tilde{K} = U S U^t \qquad (2)$$

where $S = \mathrm{diag}(\gamma_1, \ldots, \gamma_N)$ is a diagonal
matrix containing the eigenvalues of $\tilde{K}$ and
$U = [u_1, \ldots, u_N]$ is an orthonormal matrix. The column vectors
$u_i = [u_{i1}, \ldots, u_{iN}]^t$ are the eigenvectors corresponding to
the eigenvalues $\gamma_i$. Besides, it can easily be shown that
$\tilde{K} = HKH$, where $H = I - \frac{1}{N} \mathbf{1}\mathbf{1}^t$ and
$\mathbf{1} = [1, \ldots, 1]^t$ is an $N \times 1$ vector.

Let $C$ denote the covariance matrix of the elements of the training set
mapped by $\tilde{\varphi}$. Within the Kernel PCA methodology, $C$ does
not need to be computed explicitly; only $\tilde{K}$ needs to be known to
extract features from the training set [13]. The subspace of the feature
space $F$ spanned by the first $l$ eigenvectors of $C$ will be referred
to as the Kernel PCA space in what follows. The Kernel PCA space is the
subspace of $F$ obtained from learning the training data.

Let $\chi$ be any element of the input space $\mathcal{X}$. The
projection of $\chi$ on the Kernel PCA space will be denoted by
$P^l\varphi(\chi)$ (see footnote 1). The projection $P^l\varphi(\chi)$
can be obtained as given in [8]. In the feature space $F$, the squared
distance $d_F^2$ between a test point $\chi$ and its projection on the
Kernel PCA space is given by [8]:

$$d_F^2[\varphi(\chi), P^l\varphi(\chi)] = \|\varphi(\chi) - P^l\varphi(\chi)\|^2 = k(\chi,\chi) - 2\,\varphi(\chi)^t P^l\varphi(\chi) + P^l\varphi(\chi)^t P^l\varphi(\chi)$$

Using some matrix manipulations, this squared distance can be expressed
only in terms of kernels as:

$$d_F^2[\varphi(\chi), P^l\varphi(\chi)] = k(\chi,\chi) + \frac{1}{N^2}\mathbf{1}^t K \mathbf{1} - \frac{2}{N}\mathbf{1}^t k_\chi + \tilde{k}_\chi^t M \tilde{K} M \tilde{k}_\chi - 2\,\tilde{k}_\chi^t M \tilde{k}_\chi \qquad (3)$$

where $k_\chi = [k(\chi,\chi_1)\ k(\chi,\chi_2)\ \ldots\ k(\chi,\chi_N)]^t$,
$\tilde{k}_\chi = H(k_\chi - \frac{1}{N} K \mathbf{1})$ and
$M = \sum_{i=1}^{l} \frac{1}{\gamma_i} u_i u_i^t$.

2.1. Kernel for linear PCA

In [7], the authors presented a method to learn shape variations by
performing PCA on a training set of shapes (closed curves) represented as
the zero level sets of signed distance functions. Using the following
kernel in the formulation of Kernel PCA presented above amounts to
performing linear PCA on SDFs (see footnote 2):

$$k_{id}(\phi_i, \phi_j) = (\phi_i \cdot \phi_j) = \int\!\!\int \phi_i(u,v)\,\phi_j(u,v)\,du\,dv \qquad (4)$$

for all SDFs $\phi_i$ and $\phi_j : \mathbb{R}^2 \to \mathbb{R}$.

A different representation for shapes is to use binary maps, i.e., to set
to 1 the pixels located inside the shape and to 0 the pixels located
outside (see Figure 1). One can change the shape representation from SDFs
to binary maps as follows:

$$H\phi = \begin{cases} 1 & \phi > 0, \\ 0 & \text{else.} \end{cases}$$

Note that, in this case, the kernel that performs linear PCA is given by
$k_{id}(H\phi_i, H\phi_j) = (H\phi_i \cdot H\phi_j)$.
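As an illustration of the two shape representations, the following minimal sketch (not the authors' code) converts a sampled SDF into a binary map and evaluates the linear kernel as a discrete inner product. The function names `heaviside_map` and `linear_kernel`, the `pixel_area` parameter, and the toy disc are assumptions introduced here; the sketch also assumes the SDF is positive inside the shape, in line with the thresholding rule above.

```python
import numpy as np

def heaviside_map(sdf):
    """Binary map H(phi): 1 where the SDF is positive, 0 elsewhere."""
    return (sdf > 0).astype(float)

def linear_kernel(a, b, pixel_area=1.0):
    """Discrete inner product: pixelwise products summed over the grid."""
    return float(np.sum(a * b) * pixel_area)

# Toy "SDF" of a disc of radius 3 on a 9x9 grid, positive inside (assumed sign).
yy, xx = np.mgrid[-4:5, -4:5]
sdf = 3.0 - np.sqrt(xx ** 2 + yy ** 2)
binary = heaviside_map(sdf)

k_sdf = linear_kernel(sdf, sdf)          # linear kernel on SDFs, as in (4)
k_bin = linear_kernel(binary, binary)    # linear kernel on binary maps
```

For binary maps, the linear kernel of a shape with itself is simply its area in pixels, which is one way to see why this representation is easy to interpret.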

2.2.  Kernel  for  nonlinear  PCA 

Choosing a nonlinear kernel function $k(\cdot,\cdot)$ leads to performing
nonlinear PCA. The exponential kernel has been a popular choice in the
machine learning community and has proven to nicely extract nonlinear
structures from data sets. Using SDFs for representing shapes, this
kernel is given by

$$k_{\varphi_\sigma}(\phi_i, \phi_j) = e^{-\frac{\|\phi_i - \phi_j\|^2}{2\sigma^2}}, \qquad (5)$$

where $\sigma^2$ is the variance parameter computed a priori and
$\|\phi_i - \phi_j\|^2$ is the squared $L_2$-distance between two SDFs
$\phi_i$ and $\phi_j$. If the shapes are represented by binary maps, the
corresponding kernel is

$$k_{\varphi_\sigma}(H\phi_i, H\phi_j) = e^{-\frac{\|H\phi_i - H\phi_j\|^2}{2\sigma^2}}. \qquad (6)$$

This exponential kernel is one among many possible choices of Mercer
kernels: other kernels could possibly be used to extract other specific
features from the training set [8].

1 In this notation, $l$ refers to the first $l$ eigenvectors of $C$ used
to build the Kernel PCA space.

2 $id$ here stands for the identity function: when performing linear PCA,
the kernel used is the inner scalar product in input space, hence the
corresponding mapping function $\varphi = id$.
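The squared distance (3) can be evaluated numerically from kernel values alone. The sketch below is one possible NumPy transcription under stated assumptions, not the authors' implementation: shapes are flattened binary maps, the exponential kernel (6) is used, and the helper names `exp_kernel`, `fit_kpca` and `squared_distance`, as well as the toy training set, are hypothetical.

```python
import numpy as np

def exp_kernel(a, b, sigma):
    """Exponential kernel (6) between two flattened binary maps."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def fit_kpca(train, sigma, l):
    """Build K, H, the centered matrix K_tilde and M from the l largest eigenpairs."""
    n = len(train)
    K = np.array([[exp_kernel(a, b, sigma) for b in train] for a in train])
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K_tilde = H @ K @ H                        # centered kernel matrix
    gammas, U = np.linalg.eigh(K_tilde)        # eigenvalues in ascending order
    idx = np.argsort(gammas)[::-1][:l]         # indices of the l largest
    M = sum(np.outer(U[:, i], U[:, i]) / gammas[i] for i in idx)
    return K, H, K_tilde, M

def squared_distance(x, train, sigma, model):
    """Squared feature-space distance (3) between x and the Kernel PCA space."""
    K, H, K_tilde, M = model
    n = len(train)
    ones = np.ones(n)
    k_x = np.array([exp_kernel(x, t, sigma) for t in train])
    k_x_tilde = H @ (k_x - K @ ones / n)
    return (exp_kernel(x, x, sigma)
            + ones @ K @ ones / n ** 2
            - 2.0 * ones @ k_x / n
            + k_x_tilde @ M @ K_tilde @ M @ k_x_tilde
            - 2.0 * k_x_tilde @ M @ k_x_tilde)

# Toy data (assumption): five "shapes" of 16 pixels with one active pixel each.
train = np.eye(5, 16)
model = fit_kpca(train, sigma=1.0, l=4)
d_train = squared_distance(train[0], train, 1.0, model)
novel = np.zeros(16)
novel[10] = 1.0
d_novel = squared_distance(novel, train, 1.0, model)
```

With $l = N - 1$ components the centered training data are represented exactly, so `d_train` should be numerically zero, while the unseen shape `novel` keeps a strictly positive distance to the learnt space.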

2.3.  Shape  Prior  for  GAC 

To include prior knowledge on shape in the GAC framework, we propose to
use the projection on the Kernel PCA space as a model and to minimize the
following energy:

$$E^F_{shape}(\chi) := d_F^2[\varphi(\chi), P^l\varphi(\chi)] \qquad (7)$$

A similar idea was proposed in [13, 8] for the purpose of pattern
recognition. In (7), $\chi$ is a test shape represented using either an
SDF ($\chi = \phi$) or a binary map ($\chi = H\phi$), and $\varphi$
refers to either $id$ (linear PCA) or $\varphi_\sigma$ (Kernel PCA).
Minimizing $E^F_{shape}$ amounts to driving the test shape $\chi$ towards
the Kernel PCA space computed a priori from a training set of shapes
using (2). In the GAC framework, the minimization of $E^F_{shape}(\chi)$
can be undertaken as follows:

$$\frac{d\phi}{dt} = -\nabla_\phi E^F_{shape} = -\nabla_\chi E^F_{shape}\,\frac{d\chi}{d\phi} \qquad (8)$$

The gradient of $E^F_{shape}$ can be computed by applying calculus of
variations and (3). For the kernel given in (6), the following result is
obtained:

$$\nabla_\phi E^F_{shape} = -\frac{1}{\sigma^2} \sum_{i=1}^{N} g_i(\phi)\, k_{\varphi_\sigma}(H\phi, H\phi_i)\, \delta(\phi)\, [H\phi - H\phi_i]$$

with $[g_1(\phi), \ldots, g_N(\phi)] = -\frac{2}{N}\mathbf{1}^t + 2\,\tilde{k}_{H\phi}^t M \tilde{K} M H - 4\,\tilde{k}_{H\phi}^t M H$.

2.3.1 Linear PCA vs Kernel PCA

In this section we compare linear PCA with nonlinear PCA for two
different representations of shapes, i.e., SDFs and binary maps. Two
training sets of shapes were used: the first training set consists of
various shapes of a man playing soccer and the second training set
consists of various shapes of a shark (see Figure 1). These shapes were
aligned using an appropriate registration scheme (see [14]) to discard
differences between them due to Euclidean transformations. The Kernel PCA
spaces corresponding to each of the kernels presented in Sections 2.1 and
2.2 were then built for the two training sets. Starting from an arbitrary
shape, Figure 2(a), the contour was deformed by running equation (8)
until convergence: we will refer to this operation as “morphing” in what
follows.

Figure 2(b) shows the morphing results obtained by applying linear PCA on
SDFs. Figure 2(d) shows the morphing results obtained by applying linear
PCA on binary maps. As can be seen, the results obtained for the SDF
representation bear little resemblance to the elements of the training
sets. Results obtained for binary maps are more faithful to the learnt
shapes. Figures 2(c) and 2(e) present the morphing results obtained by
applying nonlinear PCA on SDFs and binary maps, respectively. In both
cases, the final contour is very close to the training set and the
results are better than any of those obtained with linear PCA.

Hence, Kernel PCA outperforms linear PCA as a means to introduce shape
priors, and binary maps seem to be an efficient shape representation.
Besides, the learning process using Kernel PCA comes with no significant
additional cost compared to linear PCA, thanks to the kernel formulation
[8, 13]. Another advantage of using the exponential kernel is that it
makes it possible to control the degree to which “mixing” is allowed
between the learnt shapes in the shape prior, larger values of $\sigma$
allowing more mixing. This is shown in Figure 3. The choice of $\sigma$
typically depends on how much the shapes vary within the data set: if the
variation is large, a smaller value for $\sigma$ is usually preferable.


3.  Intensity  based  segmentation 

Different models [18, 1, 10], which incorporate geometric and/or
photometric (color, texture, intensity) information, have been proposed
to perform region-based segmentation using level sets. In what follows,
we present a novel intensity-based segmentation framework aimed at
separating an object from the background in an image $I$. The main idea
behind the proposed method is to build an “image shape model” (denoted by
$G[I,\phi]$) by thresholding the image $I$ based on the estimates of the
intensity statistics of the object (and background) available at each
step $t$ of the contour evolution: $G[I,\phi]$ is interpreted as the most
likely shape of the object of interest, based on the available
information. The contour at time $t$ is deformed towards this “image
shape model” by minimizing the following energy:

$$E_{image} := \|H\phi - G[I,\phi]\|^2 = \int_\Omega (H\phi - G[I,\phi])^2\,dx\,dy. \qquad (9)$$

This energy functional amounts to measuring the distance between two
binary maps, namely $H\phi$ and $G[I,\phi]$. This is quite valuable in
the present context, where shapes are represented using binary maps as in
earlier sections. Thus, when the shape energy term described before is
combined with the following formulation for image segmentation, all the
elements are expressed in terms of shapes. This is one of the unique
contributions of this work. In what follows, we describe two particular
cases of this general framework.

Figure 1. Three training sets (before alignment; binary images are
presented here). First row: “Soccer Player” silhouettes (6 of the 22
used). Second row: “Shark” silhouettes (6 of the 15 used). Third row: “4
words” (6 of the 80 learnt; 20 fonts per word).

Figure 2. Morphing results of an arbitrary shape, obtained using linear
PCA and Kernel PCA applied on both signed distance functions and binary
maps. First row: results for the “Soccer Player” training set. Second
row: results for the “Shark” training set. (a): Initial shape, (b): PCA
on SDF, (c): Kernel PCA on SDF, (d): PCA on binary maps, (e): Kernel PCA
on binary maps.


3.1. Object and background with different mean intensities

As in [1, 18], we assume that the image is composed of two regions having
different intensity means: $\mu_o$ (respectively $\mu_b$) is the mean
intensity of the object (respectively of the background). Given an
initial guess for the shape of the object and representing the contour as
the zero level set of an SDF $\phi$, one can calculate the mean intensity
inside ($\mu_1$) and outside ($\mu_2$) the curve as
$\mu_1 = \frac{\int I\,H\phi\,dx\,dy}{\int H\phi\,dx\,dy}$ and
$\mu_2 = \frac{\int I\,(1 - H\phi)\,dx\,dy}{\int (1 - H\phi)\,dx\,dy}$.
The goal is to deform this initial contour so that $\mu_1 = \mu_o$ and
$\mu_2 = \mu_b$. To achieve this, the “image shape model” $G[I,\phi]$ is
generated at each step $t$ in the following manner:

$$\text{if } \mu_1 > \mu_2: \quad G[I,\phi](x,y) = \begin{cases} 1 & I(x,y) > \frac{\mu_1 + \mu_2}{2}, \\ 0 & \text{else;} \end{cases}$$

$$\text{if } \mu_1 < \mu_2: \quad G[I,\phi](x,y) = \begin{cases} 1 & I(x,y) < \frac{\mu_1 + \mu_2}{2}, \\ 0 & \text{else.} \end{cases}$$

Notice that $G[I,\phi]$ is the image shape model (binary map) obtained
from thresholding the image intensities so that values closer to $\mu_1$
are classified as object (set to 1) and others are classified as
background (set to 0). For numerical experiments, the function
$G[I,\phi]$ is calculated as follows:

$$\text{if } \mu_1 > \mu_2: \quad G[I,\phi,\epsilon] = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{I - \frac{\mu_1 + \mu_2}{2}}{\epsilon}\right);$$

$$\text{else:} \quad G[I,\phi,\epsilon] = \frac{1}{2} - \frac{1}{\pi}\arctan\left(\frac{I - \frac{\mu_1 + \mu_2}{2}}{\epsilon}\right),$$

where $\epsilon$ is a parameter such that
$G[I,\phi,\epsilon] \to G[I,\phi]$ as $\epsilon \to 0$.
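The mean-based image shape model and the energy (9) translate directly into array operations. The following sketch is illustrative only: the function names, the toy image and the crude initial contour are assumptions, only the sign of the level-set function is used, and both regions are assumed to be non-empty.

```python
import numpy as np

def region_means(image, sdf):
    """Mean intensity inside (mu1) and outside (mu2) the zero level set."""
    inside = (sdf > 0).astype(float)                      # H(phi)
    mu1 = np.sum(image * inside) / np.sum(inside)
    mu2 = np.sum(image * (1 - inside)) / np.sum(1 - inside)
    return mu1, mu2

def image_shape_model(image, sdf, eps=None):
    """G[I, phi]: hard threshold if eps is None, smooth arctan version otherwise."""
    mu1, mu2 = region_means(image, sdf)
    centered = image - 0.5 * (mu1 + mu2)
    sign = 1.0 if mu1 > mu2 else -1.0
    if eps is None:
        return (sign * centered > 0).astype(float)
    return 0.5 + sign * np.arctan(centered / eps) / np.pi

def image_energy(image, sdf, eps=None):
    """E_image of (9), approximated by a sum over pixels."""
    inside = (sdf > 0).astype(float)
    return float(np.sum((inside - image_shape_model(image, sdf, eps)) ** 2))

# Toy image: bright 4x4 square on a dark background.
image = np.full((8, 8), 0.1)
image[2:6, 2:6] = 0.9
# Crude initial contour, shifted by one pixel with respect to the object.
sdf = -np.ones((8, 8))
sdf[3:7, 3:7] = 1.0

G_hard = image_shape_model(image, sdf)
G_soft = image_shape_model(image, sdf, eps=1e-3)
energy = image_energy(image, sdf)
```

Even though the initial contour only partly overlaps the object, the thresholded model `G_hard` already coincides with the bright square, and `energy` counts the pixels on which the current contour and the model disagree.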


3.2. Object and background with different variances

Following [10], we assume that the image is composed of two regions with
different variances in intensity. The mean intensities are computed as
before, while the variances inside ($\sigma_1^2$) and outside
($\sigma_2^2$) the curve are computed as follows:
$\sigma_1^2 = \frac{\int (I - \mu_1)^2\,H\phi\,dx\,dy}{\int H\phi\,dx\,dy}$
and
$\sigma_2^2 = \frac{\int (I - \mu_2)^2\,(1 - H\phi)\,dx\,dy}{\int (1 - H\phi)\,dx\,dy}$.
In this case, the image shape model $G[I,\phi]$ is obtained as follows:

$$\text{if } \sigma_1 < \sigma_2: \quad G[I,\phi] = \begin{cases} 1 & I_1 > I(x,y) > I_2, \\ 0 & \text{else;} \end{cases}$$

$$\text{if } \sigma_1 > \sigma_2: \quad G[I,\phi] = \begin{cases} 1 & I(x,y) > I_1 \text{ or } I(x,y) < I_2, \\ 0 & \text{else,} \end{cases}$$

where

$$I_1 = \frac{\sigma_1^2\mu_2 - \sigma_2^2\mu_1 - \alpha}{\sigma_1^2 - \sigma_2^2} \quad \text{and} \quad I_2 = \frac{\sigma_1^2\mu_2 - \sigma_2^2\mu_1 + \alpha}{\sigma_1^2 - \sigma_2^2},$$

$$\alpha = \sigma_1\sigma_2 \sqrt{(\mu_1 - \mu_2)^2 + 2(\sigma_1^2 - \sigma_2^2)\log\left(\frac{\sigma_1}{\sigma_2}\right)}.$$

These are the two intensities at which Gaussian densities with parameters
$(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$ are equal. In the rule above,
$I_1$ is understood as the larger and $I_2$ as the smaller of the two
(which is the case for these expressions when $\sigma_1 < \sigma_2$).

This thresholding ensures that pixels set to 1 in $G[I,\phi]$ correspond
to pixels that are more likely to belong to the object of interest in the
image, based on information available at step $t$. In the same way,
pixels set to 0 in $G[I,\phi]$ correspond to pixels that are more likely
to belong to the background. Figure 4 shows the different cases
justifying the way the thresholding above is performed.

In numerical applications, the binary map $G[I,\phi]$ is computed as
follows (for $\epsilon$ small):

$$\text{if } \sigma_1 < \sigma_2: \quad G[I,\phi,\epsilon] = \frac{1}{\pi}\arctan\left(\frac{I - I_2}{\epsilon}\right) - \frac{1}{\pi}\arctan\left(\frac{I - I_1}{\epsilon}\right);$$

$$\text{else:} \quad G[I,\phi,\epsilon] = 1 - \frac{1}{\pi}\arctan\left(\frac{I - I_2}{\epsilon}\right) + \frac{1}{\pi}\arctan\left(\frac{I - I_1}{\epsilon}\right).$$

Figure 5 presents results obtained for each of the image shape models
presented above.

Figure 3. Influence of $\sigma$ for the Kernel PCA method (exponential
kernel) applied on binary maps. Morphing results of an arbitrary shape
are presented for the “Shark” training set. (a): Initial shape, (b):
Morphing result for $\sigma = 3$, (c): $\sigma = 7$, (d): $\sigma = 9$,
(e): $\sigma = 15$.

Figure 4. Probability density functions. Thick line: $p(I \in
\text{Object})$; thin line: $p(I \in \text{Background})$. It is
straightforward to see that $p(I \in \text{Object}) > p(I \in
\text{Background})$ for $I_2 < I < I_1$ when $\sigma_1 < \sigma_2$, and
for $I > I_1$ or $I < I_2$ when $\sigma_1 > \sigma_2$.

Figure 5. Segmentation results obtained using $E_{image}$, equation (9).
Initial contour in black, final contour in white. Left: first-order
moment only. Right: second-order moments (two regions of same mean
intensity and different variances).

4. Combining Shape Prior and Intensity Information

In this part, we combine shape knowledge obtained by performing nonlinear
PCA on binary maps with image information obtained by building an “image
shape model”, within the GAC framework. As presented above,
$E^F_{shape}$ and $E_{image}$ are squared distances between the shape of
the current contour and a model. However, $E_{image}$ is a squared
distance in input space, whereas $E^F_{shape}$ is expressed in the
feature space. Thus, equilibrium would be hard to reach between “forces”
extracted from $E^F_{shape}$ and $E_{image}$ (see footnote 3). This can
be remedied by noticing that, for any SDFs $\phi_a$ and $\phi_b$:

$$d_F^2(\phi_a, \phi_b) = 2 - 2\,k_{\varphi_\sigma}(H\phi_a, H\phi_b) = 2 - 2\,e^{-\frac{\|H\phi_a - H\phi_b\|^2}{2\sigma^2}}$$

By defining $E_{shape} = -2\sigma^2 \log\left(\frac{2 - E^F_{shape}}{2}\right)$,
a new shape prior energy functional is obtained (see footnote 4). This
energy $E_{shape}$, like $E_{image}$, is homogeneous to a squared
distance in input space. This consistent description of energies allows
for efficient and intuitive equilibration between image cues and shape
knowledge, through the following energy functional:

$$E(\phi, I) = \beta_1 E_{shape}(\phi) + \beta_2 E_{image}(\phi, I) \qquad (10)$$

3 $\nabla_\phi E^F_{shape}$ would, indeed, exhibit nonlinear behaviors
due to the exponential terms figuring in its expression.

4 By applying the chain rule, one can verify that $\nabla_\phi E_{shape}$
and $\nabla_\phi E^F_{shape}$ have the same direction and similar
influence on the evolution.
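The logarithmic rescaling of the shape energy can be checked on a pair of binary maps: for the exponential kernel, it maps the squared feature-space distance between two shapes back to their squared distance in input space. The sketch below is a hedged illustration with hypothetical helper names (`feature_distance_sq`, `to_input_space`, `total_energy`); it applies the transform to a distance between two shapes, whereas in the text it is applied to the distance to the Kernel PCA space.

```python
import numpy as np

def feature_distance_sq(a, b, sigma):
    """Squared feature-space distance 2 - 2 k(a, b) for the exponential kernel."""
    return 2.0 - 2.0 * np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def to_input_space(e_feature, sigma):
    """Rescale a feature-space energy to a squared input-space distance."""
    return -2.0 * sigma ** 2 * np.log((2.0 - e_feature) / 2.0)

def total_energy(e_shape, e_image, beta1, beta2):
    """Weighted sum of shape and image energies, as in (10)."""
    return beta1 * e_shape + beta2 * e_image

# Two toy binary maps differing in exactly three pixels.
a = np.zeros(10)
b = a.copy()
b[:3] = 1.0
sigma = 2.0

e_feature = feature_distance_sq(a, b, sigma)
e_shape = to_input_space(e_feature, sigma)
```

Here `e_shape` recovers the number of differing pixels, i.e. the same kind of quantity that $E_{image}$ measures, which is what makes a fixed pair of weights $\beta_1$, $\beta_2$ meaningful.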


4.1.  Invariance  to  Similarity  Transformations: 

Let $p = [t_x, t_y, \theta, \rho] = [p_1, p_2, p_3, p_4]$ be a vector of
parameters corresponding to a similarity transformation, $t_x$ and $t_y$
corresponding to translation along the $x$- and $y$-axes, $\theta$ being
the rotation angle and $\rho$ the scale parameter. Let us denote by
$\hat{I}(x, y)$ the image obtained by applying the transformation:
$\hat{I}(x,y) = I(\rho(x\cos\theta - y\sin\theta + t_x),\ \rho(x\sin\theta + y\cos\theta + t_y))$.
As mentioned above, the elements of the training sets are aligned prior
to the construction of the space of shapes. Supposing that the object of
interest in $I$ differs from the registered elements of the training set
by a similarity transformation of parameter $p$, this transformation can
be recovered by minimizing $E(\phi, \hat{I})$ with respect to the
$p_i$'s. During evolution, the following gradient descent scheme can be
performed for $i \in [1, 4]$:

$$\frac{dp_i}{dt} = -\nabla_{p_i} E(\phi, \hat{I}) = -\nabla_{p_i} E_{image}(\phi, \hat{I}).$$
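To make the role of the parameters concrete, the transformed image can be sampled on the pixel grid as sketched below. This is only an illustration: the function name `transform_image`, the nearest-neighbour lookup, the zero fill outside the image and the choice of the top-left pixel as coordinate origin are all assumptions, since the text does not specify an interpolation scheme or an origin.

```python
import numpy as np

def transform_image(image, tx, ty, theta, rho):
    """Sample I at rho * (R(theta) [x, y] + [tx, ty]) with nearest-neighbour lookup.

    Pixels whose source coordinates fall outside the image are set to 0.
    """
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xs = rho * (x * np.cos(theta) - y * np.sin(theta) + tx)
    ys = rho * (x * np.sin(theta) + y * np.cos(theta) + ty)
    xi = np.rint(xs).astype(int)
    yi = np.rint(ys).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(image)
    out[valid] = image[yi[valid], xi[valid]]
    return out

image = np.arange(12.0).reshape(3, 4)
same = transform_image(image, tx=0.0, ty=0.0, theta=0.0, rho=1.0)
shifted = transform_image(image, tx=1.0, ty=0.0, theta=0.0, rho=1.0)
```

A gradient descent on the $p_i$'s could then, for instance, estimate each derivative of $E_{image}$ by finite differences on such transformed images; that choice is likewise an assumption and not something stated in the text.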

5.  Experiments 

This  section  presents  segmentation  results  obtained  by 
introducing  shape  prior  using  Kernel  PCA  on  binary  maps 
and  using  our  intensity  based  segmentation  methodology: 
Equation  (10)  was  run  until  convergence  on  diverse  images. 

5.1. Toy Example: Shape Priors Involving Objects of Different Types

Kernel methods have been used to learn complex multimodal distributions
in an unsupervised fashion (see [8] and the references therein). The goal
of this section is to investigate the ability of the proposed framework
to simultaneously learn and accurately detect objects of different
shapes. To this end, we built a training set consisting of four words,
“orange”, “yellow”, “square” and “circle”, each written using twenty
different fonts. The size of the fonts was chosen to lead to words of
roughly the same length. The obtained words (binary maps, see Figure 1)
were then registered according to their centroid. No further effort such
as matching the letters of the different words was pursued. The method
presented in Section 2 was used to build the corresponding space of
shapes for the registered binary maps.

We tested our framework on images where a corrupted version of any of the
four words “orange”, “yellow”, “square” or “circle” was present (Figure
6, first row). Word recognition is a challenging task and addressing it
using geometric active contours may not be a panacea. However, the
ability of the level set representation to naturally handle topological
changes was found to be useful for this purpose: during evolution, the
contour split and merged a certain number of times to segment the
disconnected letters of the words. In all the following experiments,
$\beta_1$ and $\beta_2$ were fixed in (10) and the same initial contour
was used.


Experiment 1: In this experiment, one of the words “square” belonging to
the training set was corrupted: the letter “u” was almost completely
erased. The shape thus obtained was filled with Gaussian noise of mean
$\mu_o = 0.5$ and variance $\sigma_o = 0.05$. The background was also
filled with Gaussian noise of the same mean $\mu_b = 0.5$ but of variance
$\sigma_b = 0.2$. The result of applying our method is presented in
Figure 6(a). Despite the noise and the partial deletion, a very
convincing segmentation is obtained. In particular, the correct font is
detected and the letter “u” is accurately reconstructed. In addition, the
final curve is smooth even though no curvature term was used for
regularization. Hence, using binary maps to represent shape priors can
have valuable smoothing effects, even when dealing with noisy images.

Experiment 2: In this second experiment, one of the elements of the
training set was used. A thick line (occlusion) was drawn on the word and
a fair amount of Gaussian noise was added to the resulting image. The
result of applying our method is presented in Figure 6(b). Despite the
noise and the occlusion, a very convincing segmentation is obtained. In
particular, the correct font is detected and the thick line is completely
removed. Once again, the final contour is smooth despite the fairly large
amount of noise.

Experiment 3: Here, the word “yellow” was written using a different font
from the ones used to build the training set. Additionally, a “linear
shadowing” was used in the background (completely hiding the letter “y”)
and the letter “w” was replaced by a grey square. The result of applying
our framework is presented in Figure 6(c). The word “yellow” is correctly
recognized and segmented. Also, the letters “y” and “w” were completely
reconstructed.

Experiment 4: In this experiment, the word “orange” was handwritten in
capital letters roughly matching the size of the letters of the words in
the training set. The intensity of the letters was chosen to be rather
close to some parts of the background. In addition, the word was blurred
and smeared in a way that made its letters barely recognizable. This type
of blurring effect is often observed in medical images due to patient
motion. This image is particularly difficult to segment, even using a
shape prior, since the spacing between letters and the letters themselves
are very irregular due to the combined effects of handwriting and
blurring. Hence, mixing between classes (confusion between any of the 4
words) can be expected in the final result. In the final result obtained,
the word “orange” is not only recognized but also satisfactorily
recovered; in particular, a thick font was obtained to model the thick
letters of the word (Figure 6(d)).

Hence, starting for each experiment from the same initial contour, our
algorithm was able to accurately detect which word was present in the
image. This highlights the ability of our method not only to gather image
information throughout evolution but also to distinguish between objects
of different classes (“orange”, “yellow”, “square” and “circle”).
Comparing the final contours obtained in each experiment to the final
“image shape model” $G[I,\phi]$ (last row of Figure 6), one can measure
the effect of our shape prior model in constraining the contour
evolution: the image information alone would lead to a shape that would
bear very little resemblance to any of the four words learnt.

5.2.  Real  Images  Example:  Tracking  of  challenging 
sequences 

To test the robustness of the framework, tracking was performed on two
challenging sequences. A very simple tracking scheme was used: the same
initial contour was used for each image in the sequence. This contour was
initially positioned wherever the final contour was in the preceding
image. The coefficients $\beta_1$ and $\beta_2$ were fixed throughout
each sequence. Of course, many efficient tracking algorithms have already
been proposed. However, convincing results were obtained here without
considering the system dynamics, for instance. This highlights the
efficiency of including prior knowledge on shape for the robust tracking
of deformable objects.

5.2.1  Soccer  Player  Sequence 

In this sequence (composed of 130 images), a man is juggling a soccer
ball. The challenge is to accurately capture the large deformations due
to the movement of the person (e.g., limbs undergo large changes in
aspect), while sufficiently constraining the contour to discard clutter
in the background. A training set of 22 silhouettes (Figure 1, first row)
was used. The version of $E_{image}$ involving the intensity means only
was used to capture image information. Despite the small number of shapes
used, successful tracking was obtained, correctly capturing the posture
of the player.

5.2.2  Shark  Video 

In this sequence (composed of 70 images), a shark is moving through a
highly cluttered environment. Besides, the shark is often occluded by
other fish and is poorly contrasted. To perform tracking, 15 shapes were
extracted from the first half of the video (Figure 1, second row) and
used to build the shape prior. The version of $E_{image}$ involving the
variances was used to make up for the poor contrast of the shark in the
images. Once again, despite the small training set, successful tracking
performance was observed: the shark was correctly captured, while clutter
and obstacles were rejected.

6.  Conclusion 

In this work, we used Kernel PCA to introduce prior knowledge about
shapes in the GAC framework. Better performance of Kernel PCA over linear
PCA was demonstrated for two representations of shapes (binary maps and
SDFs). We also developed a general approach to separate an object from
the background using various image intensity statistics. In our
algorithm, image information and shape knowledge were combined in a
consistent fashion: both energies were expressed in terms of shapes. The
proposed method not only made it possible to learn shapes of different
objects simultaneously but was also robust to noise, blurring, occlusion
and clutter. In addition, even though the same parameters and the same
initial contour were used for each image of the sequences, successful
tracking was obtained: this further highlights the robustness of the
framework.